An original framework for understanding human actions and body language by using deep neural networks
The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour.
By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way.
These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively.
The processing of body movements, in turn, plays a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements;
both are essential tasks in many computer vision applications, including event recognition and video surveillance.
In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first one, a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) for the recognition of sign language and semaphoric hand gestures is proposed; the second module presents a solution based on a 2D skeleton and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition using a 3D skeleton and Deep Neural Networks (DNNs) is provided.
The performance of LSTM-RNNs is explored in depth, due to their ability to model the long-term contextual information of temporal sequences, which makes them suitable for analysing body movements.
All the modules were tested on challenging datasets, well known in the literature, showing remarkable results compared to current methods.
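The recurrence exploited by all three modules can be illustrated with a minimal single-unit LSTM step; the sketch below uses hypothetical scalar weights and is only a toy illustration, not the thesis implementation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    # w maps gate name -> (input weight, recurrent weight, bias); scalar toy cell
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])   # input gate
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])   # forget gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])   # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2]) # candidate value
    c = f * c_prev + i * g       # cell state carries long-term context
    h = o * math.tanh(c)         # hidden state exposed to the next layer
    return h, c

# run a toy temporal sequence (e.g., one joint coordinate over time) through the cell
w = {k: (0.5, 0.5, 0.0) for k in "ifog"}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, w)
```

The gated cell state is what lets LSTM-RNNs retain context over long temporal sequences, which is the property the thesis exploits for body-movement analysis.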
Adaptive bootstrapping management by keypoint clustering for background initialization
The availability of a background model that describes the scene is a prerequisite for many computer vision applications. In several situations, the model cannot be easily generated when the background contains some foreground objects (i.e., the bootstrapping problem). In this letter, an Adaptive Bootstrapping Management (ABM) method, based on keypoint clustering, is proposed to model the background of video sequences acquired by mobile and static cameras. First, keypoints are detected on each frame by the A-KAZE feature extractor; then, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is used to find keypoint clusters. These clusters represent the candidate regions of foreground elements inside the scene. The ABM method manages the scene changes generated by foreground elements both in the background model initialization, addressing the bootstrapping problem, and in the background model updating. Moreover, it achieves good results with both mobile and static cameras and requires a small number of frames to initialize the background model.
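The clustering step can be sketched with a minimal pure-Python DBSCAN over 2D keypoint coordinates; the point values are hypothetical and the real method operates on A-KAZE keypoints, but the density-based grouping is the same idea:

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point (-1 = noise)."""
    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1              # noise (may become a border point later)
            continue
        cluster += 1                    # new candidate foreground region
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster     # previously-noise point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs_j = neighbours(j)
            if len(nbrs_j) >= min_pts:  # core point: keep expanding the cluster
                seeds.extend(nbrs_j)
    return labels

# two dense keypoint blobs (candidate foreground regions) and one isolated point
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
labels = dbscan(pts, eps=2.0, min_pts=3)
```

Each resulting cluster marks a region dense in keypoints, which the ABM method treats as a candidate foreground element; isolated keypoints are discarded as noise.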
Online separation of handwriting from freehand drawing using extreme learning machines
Online separation between handwriting and freehand drawing is still an active research area in the field of sketch-based interfaces. In recent years, most approaches in this area have focused on statistical separation methods, which have achieved significant results in terms of performance. More recently, Machine Learning (ML) techniques have proven to be even more effective by treating the separation problem as a classification task. Despite this, several aspects of these techniques can still be considered open problems, including: 1) the trade-off between separation performance and training time; 2) the separation of handwriting from different types of freehand drawings. To address these drawbacks, this paper proposes a novel separation algorithm based on a set of original features and an Extreme Learning Machine (ELM). Extensive experiments on a wide range of sketched schemes (i.e., text and graphical symbols), more numerous than those usually tested in key works of the current literature, have highlighted the effectiveness of the proposed approach. Finally, measurements of accuracy and speed of computation, during both training and testing stages, have shown that the ELM can be considered the best choice in this research area, even when compared with other popular ML techniques.
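The speed advantage of an ELM comes from its structure: the hidden-layer weights are random and never trained, so fitting reduces to one closed-form least-squares solve for the output weights. A minimal sketch with NumPy, using toy two-feature strokes rather than the paper's original feature set:

```python
import numpy as np

def elm_train(X, y, n_hidden=20, seed=0):
    """Extreme Learning Machine: random hidden layer + closed-form output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                 # least-squares solve, no iterations
    return W, b, beta

def elm_predict(X, model):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# toy strokes: two hypothetical features each, labels +1 (text) / -1 (drawing)
X = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]])
y = np.array([1.0, 1.0, -1.0, -1.0])
model = elm_train(X, y)
pred = np.sign(elm_predict(X, model))
```

Because training involves no gradient iterations, the training-time side of the trade-off mentioned above is largely eliminated, while classification accuracy depends on the quality of the input features.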
A keypoint-based method for background modeling and foreground detection using a PTZ camera
Automatic scene analysis is still a topic of great interest in computer vision due to the growing possibilities provided by increasingly sophisticated optical cameras. Background modeling, including its initialization and its updating, is a crucial aspect that can play a main role in a wide range of application domains, such as vehicle tracking, person re-identification and object recognition. In any case, many challenges remain partially unsolved, including camera movements (i.e., pan/tilt), scale changes (i.e., zoom-in/zoom-out) and deletion of the initial foreground elements from the background model. This paper describes a method for background modeling and foreground detection able to address all the mentioned challenges. In particular, the proposed method uses spatio-temporal tracking of sets of keypoints to distinguish the background from the foreground. It analyses these sets with a grid strategy to estimate both camera movements and scale changes. The same sets are also used to construct a panoramic background model and to delete possible initial foreground elements from it. Experiments carried out on some challenging videos from three different datasets (i.e., PBI, VOT and Airport MotionSeg) demonstrate the effectiveness of the method on PTZ cameras. Other videos from a further dataset (i.e., FBMS) have been used to measure the accuracy of the proposed method with respect to some key works of the current state-of-the-art. Finally, some videos from another dataset (i.e., SBI) have been used to test the method on stationary cameras.
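To give an intuition for keypoint-based camera-motion estimation, a pure pan/tilt between two frames can be recovered as a robust statistic of matched-keypoint displacements; the sketch below (with hypothetical coordinates, not the paper's actual grid-based estimator) uses the median so that a minority of foreground matches does not bias the estimate:

```python
from statistics import median

def estimate_camera_shift(prev_pts, curr_pts):
    """Estimate global pan/tilt as the median displacement of matched keypoints.
    Background keypoints dominate the match set, so the median is robust to a
    minority of foreground (moving-object) matches."""
    dx = median(c[0] - p[0] for p, c in zip(prev_pts, curr_pts))
    dy = median(c[1] - p[1] for p, c in zip(prev_pts, curr_pts))
    return dx, dy

# four background keypoints shifted by a camera pan of (+5, 0), one foreground outlier
prev_pts = [(10, 10), (40, 12), (70, 30), (90, 80), (50, 50)]
curr_pts = [(15, 10), (45, 12), (75, 30), (95, 80), (80, 90)]
shift = estimate_camera_shift(prev_pts, curr_pts)  # the outlier is suppressed
```

Keypoints whose displacement disagrees with the global estimate are then candidates for the foreground, which is the core of distinguishing camera motion from object motion.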
A Multipurpose Autonomous Robot for Target Recognition in Unknown Environments
In recent years, the technological improvements of consumer robots, in terms of processing capacity and sensors, have enabled an ever-increasing number of researchers to quickly develop both scale prototypes and alternative low-cost solutions. In these contexts, a critical aspect is the design of ad-hoc algorithms according to the features of the available hardware. This paper proposes a prototype of an autonomous robot for mapping unknown environments and recognizing target objects. During the setup phase, one or more target objects are shown to the RGB camera of the robot which, for each of them, extracts and stores a set of A-KAZE features. Afterwards, the robot uses ultrasonic distance measurements and the RGB stream to map the whole environment and search for a set of A-KAZE features matchable with those previously acquired. The paper also reports both preliminary tests carried out on a reference indoor environment and a case study performed in an outdoor one, which validate the proposed system.
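Since A-KAZE descriptors are binary, matching stored features against observed ones reduces to Hamming distances; a common way to keep only reliable matches is a Lowe-style ratio test, sketched here on toy 8-bit descriptors (illustrative values only, far shorter than real A-KAZE descriptors):

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as integers."""
    return bin(a ^ b).count("1")

def match_descriptors(stored, observed, ratio=0.8):
    """Accept a match only when the best stored descriptor is clearly
    closer than the second best (ratio test)."""
    matches = []
    for i, d in enumerate(observed):
        dists = sorted((hamming(d, s), j) for j, s in enumerate(stored))
        if len(dists) > 1 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))  # (observed index, stored index)
    return matches

stored = [0b11110000, 0b00001111]    # features stored during the setup phase
observed = [0b11110001, 0b10101010]  # features extracted while exploring
matches = match_descriptors(stored, observed)
```

Only the first observed descriptor survives the ratio test here; the second is equidistant from both stored features and is therefore rejected as ambiguous, which is exactly the behaviour that keeps target recognition reliable in cluttered scenes.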
A new descriptor for Keypoint-Based background modeling
Background modeling is a preliminary task for many computer vision applications, describing the static elements of a scene and isolating the foreground ones. Defining a robust background model of uncontrolled environments is a current challenge, since the model must manage many issues, e.g., moving cameras, dynamic background, bootstrapping, shadows, and illumination changes. Recently, methods based on keypoint clustering have shown remarkable robustness, especially to bootstrapping and camera movements, while highlighting limitations in the analysis of dynamic background (i.e., trees blowing in the wind or gushing fountains). In this paper, an innovative combination of the RootSIFT descriptor and average pooling is proposed within a keypoint clustering method for real-time background modeling and foreground detection. Compared to renowned descriptors, such as A-KAZE, this combination is invariant to small local changes in the scene, and thus more robust in dynamic background cases. Results obtained from experiments carried out on two benchmark datasets demonstrate how the proposed solution improves on previous keypoint-based models and outperforms several works of the current state-of-the-art.
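The RootSIFT transform itself is simple: L1-normalise the descriptor histogram and take the element-wise square root, so that Euclidean distances between transformed vectors correspond to the Hellinger kernel on the originals; average pooling then collapses a keypoint cluster into a single descriptor. A sketch on toy 4-bin histograms (real SIFT descriptors have 128 bins):

```python
import math

def rootsift(desc, eps=1e-7):
    """RootSIFT: L1-normalise, then element-wise square root (Hellinger mapping)."""
    s = sum(desc) + eps
    return [math.sqrt(v / s) for v in desc]

def average_pool(descs):
    """Average pooling: one representative descriptor per keypoint cluster."""
    n = len(descs)
    return [sum(col) / n for col in zip(*descs)]

# toy 4-bin "histograms" from two keypoints belonging to the same cluster
pooled = average_pool([rootsift([1, 3, 0, 0]), rootsift([2, 2, 0, 0])])
```

Averaging over a cluster smooths out the small per-keypoint fluctuations caused by waving trees or fountains, which is what gives the combination its robustness to dynamic background.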